11 research outputs found

Recommended from our members

Conspiracy in the Time of Corona: Automatic detection of Emerging Covid-19 Conspiracy Theories in Social Media and the News

Author: Holur Pavan
Roychowdhury Vwani
Shahsavari Shadi
Tangherlini Timothy R
Wang Tianyi
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Rumors and conspiracy theories thrive in environments of low confidence and low trust. Consequently, it is not surprising that ones related to the Covid-19 pandemic are proliferating given the lack of scientific consensus on the virus’s spread and containment, or on the long term social and economic ramifications of the pandemic. Among the stories currently circulating are ones suggesting that the 5G telecommunication network activates the virus, that the pandemic is a hoax perpetrated by a global cabal, that the virus is a bio-weapon released deliberately by the Chinese, or that Bill Gates is using it as cover to launch a broad vaccination program to facilitate a global surveillance regime. While some may be quick to dismiss these stories as having little impact on real-world behavior, recent events including the destruction of cell phone towers, racially fueled attacks against Asian Americans, demonstrations espousing resistance to public health orders, and wide-scale defiance of scientifically sound public mandates such as those to wear masks and practice social distancing, countermand such conclusions. Inspired by narrative theory, we crawl social media sites and news reports and, through the application of automated machine-learning methods, discover the underlying narrative frameworks supporting the generation of rumors and conspiracy theories. We show how the various narrative frameworks fueling these stories rely on the alignment of otherwise disparate domains of knowledge, and consider how they attach to the broader reporting on the pandemic. These alignments and attachments, which can be monitored in near real-time, may be useful for identifying areas in the news that are particularly vulnerable to reinterpretation by conspiracy theorists. Understanding the dynamics of storytelling on social media and the narrative frameworks that provide the generative basis for these stories may also be helpful for devising methods to disrupt their spread.

eScholarship - University of California

Novel scaling law governing stock price dynamics

Author: Holur Pavan
Miyahara Hideyuki
Qian Hai
Roychowdhury Vwani
Publication date: 24/12/2022

A stock market is typically modeled as a complex system where the purchase, holding or selling of individual stocks affects other stocks in nonlinear and collaborative ways that cannot be always captured using succinct models. Such complexity arises due to several latent and confounding factors, such as variations in decision making because of incomplete information, and differing short/long-term objectives of traders. While few emergent phenomena such as seasonality and fractal behaviors in individual stock price data have been reported, universal scaling laws that apply collectively to the market are rare. In this paper, we consider the market-mode adjusted pairwise correlations of returns over different time scales ($\tau$), $c_{i,j}(\tau)$, and discover two such novel emergent phenomena: (i) the standard deviation of the $c_{i,j}(\tau)$'s scales as $\tau^{-\lambda}$, for $\tau$ larger than a certain return horizon, $\tau_0$, where $\lambda$ is the scaling exponent, (ii) moreover, the scaled and zero-shifted distributions of the $c_{i,j}(\tau)$'s are invariant of $\tau > \tau_0$. Our analysis of S&P500 market data collected over almost 20 years (2004-2020) demonstrates that the twin scaling property holds for each year and across 2 decades (orders of magnitude) of $\tau$. Moreover, we find that the scaling exponent $\lambda$ provides a summary view of market volatility: in years marked by unprecedented financial crises -- for example 2008 and 2020 -- values of $\lambda$ are substantially higher. As for analytical modeling, we demonstrate that such scaling behaviors observed in data cannot be explained by existing theoretical frameworks such as the single- and multi-factor models. To close this gap, we introduce a promising agent-based model -- inspired by literature on swarming -- that displays more of the emergent behaviors exhibited by the real market data. Comment: 45 pages
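The scaling relation stated in this abstract, where the standard deviation of $c_{i,j}(\tau)$ falls off as $\tau^{-\lambda}$ for $\tau > \tau_0$, can be illustrated with a log-log least-squares fit. The sketch below is not the authors' procedure: the function name `fit_scaling_exponent`, the synthetic data, the value of `tau0`, and the choice of ordinary least squares are all assumptions made for illustration.

```python
import math

def fit_scaling_exponent(taus, sigmas, tau0):
    """Fit log(sigma) = a - lam * log(tau) by least squares, using only tau > tau0.

    Returns (lam, a). Hypothetical helper, not taken from the paper.
    """
    points = [(math.log(t), math.log(s)) for t, s in zip(taus, sigmas) if t > tau0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept

# Synthetic stand-in data (an assumption): a flat regime up to tau0,
# then an exact power law sigma = 2 * tau^(-0.4) beyond it.
tau0 = 5
taus = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
sigmas = [1.0 if t <= tau0 else 2.0 * t ** -0.4 for t in taus]

lam, log_prefactor = fit_scaling_exponent(taus, sigmas, tau0)
print(f"estimated lambda = {lam:.3f}, prefactor = {math.exp(log_prefactor):.3f}")
```

Restricting the fit to $\tau > \tau_0$ matters here because the points at or below the return horizon do not follow the power law and would bias the slope.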

arXiv.org e-Print Archive

Which side are you on? Insider-Outsider classification in conspiracy-theoretic social media

Author: Holur Pavan
Roychowdhury Vwani
Shahsavari Shadi
Tangherlini Timothy
Wang Tianyi
Publication date: 01/01/2022

Social media is a breeding ground for threat narratives and related conspiracy theories. In these, an outside group threatens the integrity of an inside group, leading to the emergence of sharply defined group identities: Insiders -- agents with whom the authors identify and Outsiders -- agents who threaten the insiders. Inferring the members of these groups constitutes a challenging new NLP task: (i) Information is distributed over many poorly-constructed posts; (ii) Threats and threat agents are highly contextual, with the same post potentially having multiple agents assigned to membership in either group; (iii) An agent's identity is often implicit and transitive; and (iv) Phrases used to imply Outsider status often do not follow common negative sentiment patterns. To address these challenges, we define a novel Insider-Outsider classification task. Because we are not aware of any appropriate existing datasets or attendant models, we introduce a labeled dataset (CT5K) and design a model (NP2IO) to address this task. NP2IO leverages pretrained language modeling to classify Insiders and Outsiders. NP2IO is shown to be robust, generalizing to noun phrases not seen during training, and exceeding the performance of non-trivial baseline models by 20%. Comment: ACL 2022: 60th Annual Meeting of the Association for Computational Linguistics, 8+4 pages, 6 figures

arXiv.org e-Print Archive

eScholarship - University of California

Embed-Search-Align: DNA Sequence Alignment using Transformer Models

Author: Bouchard Louis-S.
Enevoldsen K. C.
Georgiou Thalia
Holur Pavan
Mboning Lajoyce
Pellegrini Matteo
Roychowdhury Vwani
Publication date: 20/09/2023

DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in two steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models (LLMs) in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce numerical representations for DNA sequences. Such models have shown early promise in tasks involving classification of short DNA sequences, such as the detection of coding vs non-coding regions, as well as the identification of enhancer and promoter sequences. Performance at sequence classification tasks does not, however, translate to sequence alignment, where it is necessary to conduct a genome-wide search to successfully align every read. We address this open problem by framing it as an Embed-Search-Align task. In this framework, a novel encoder model DNA-ESA generates representations of reads and fragments of the reference, which are projected into a shared vector space where the read-fragment distance is used as surrogate for alignment. In particular, DNA-ESA introduces: (1) Contrastive loss for self-supervised training of DNA sequence representations, facilitating rich sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. DNA-ESA is >97% accurate when aligning 250-length reads onto a human reference genome of 3 gigabases (single-haploid), far exceeds the performance of 6 recent DNA-Transformer model baselines and shows task transfer across chromosomes and species. Comment: 17 pages, Tables 5, Figures 5, Under review, ICL
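The embed, search, align framing in this abstract can be illustrated with a toy example in which reads and reference fragments are mapped into one vector space and the nearest fragment is taken as the candidate location. The sketch below is only an analogy: it swaps in a normalized 3-mer count vector for the trained DNA-ESA encoder and a plain Python list for the DNA vector store. The names `embed`, `build_store`, and `search`, the fragment length, and the stride are all assumptions, and the toy embedding only handles the characters A, C, G, and T.

```python
import math
import random
from itertools import product

K = 3  # k-mer size for the stand-in embedding (assumption)
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def embed(seq):
    """Unit-normalized k-mer count vector; a stand-in for a learned encoder."""
    vec = [0.0] * len(KMER_INDEX)
    for i in range(len(seq) - K + 1):
        vec[KMER_INDEX[seq[i:i + K]]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_store(reference, frag_len, stride):
    """Embed overlapping reference fragments; returns (start, vector) pairs."""
    return [
        (start, embed(reference[start:start + frag_len]))
        for start in range(0, len(reference) - frag_len + 1, stride)
    ]

def search(store, read, top_k=1):
    """Return start positions of the top_k fragments by cosine similarity."""
    query = embed(read)
    scored = [(sum(a * b for a, b in zip(query, vec)), start) for start, vec in store]
    scored.sort(reverse=True)
    return [start for _, start in scored[:top_k]]

# Synthetic reference (an assumption); a real genome would need a real index.
rng = random.Random(0)
reference = "".join(rng.choices("ACGT", k=400))
store = build_store(reference, frag_len=40, stride=20)

read = reference[120:160]  # a read copied from a known location
print("best candidate start:", search(store, read)[0])
```

In a full pipeline, the candidate fragments returned by the search step would then be passed to a conventional fine-grained aligner; that final step is omitted here.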

arXiv.org e-Print Archive

An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com

Author: Bandari Roja
Ebrahimzadeh Ehsan
Falahi Misagh
Holur Pavan
Roychowdhury Vwani
Shahbazi Behnam
Shahsavari Shadi
Tangherlini Timothy R.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/04/2020

Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally include only a subset of characters and their relationships, thus offering a limited perspective on that work. Yet in aggregate, these reviews capture an underlying narrative framework comprised of different actants (people, places, things), their roles, and interactions that we label the "consensus narrative framework". We represent this framework in the form of an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as a latent graphical model estimation problem. Posts and reviews are viewed as samples of subgraphs/networks of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative Machine Learning (ML) model where nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on Goodreads.com. We manually derive the ground truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in ground truth networks with our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2900 reviews per novel, we report average coverage/recall of important relationships of >80% and an average edge detection rate of >89%. These extracted narrative frameworks can generate insight into how people (or classes of people) read and how they recount what they have read to others.

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Mapping dreams in a computational space

Author: Bulkeley Kelly
Gutman Maja
Holur Pavan
Publication venue: Elsevier BV
Publication date: 27/10/2022

This article demonstrates that an automated system of linguistic analysis can be developed – the Oneirograph – to analyze large collections of dreams and computationally map their contents in terms of typical situations involving an interplay of characters, activities, and settings. Focusing the analysis first on the twin situations of fighting and fleeing, the results provide densely detailed empirical evidence of the underlying semantic structures of typical dreams. The results also indicate that the Oneirograph analytic system can be applied to other typical dream situations as well (e.g., flying, falling), each of which can be computationally mapped in terms of a distinctive constellation of characters, activities, and settings.

Digital repository of Slovenian research organizations

Modelling social readers: novel tools for addressing reception from online book reviews

Author: Ebrahimzadeh Ehsan
Holur Pavan
Roychowdhury Vwani
Shahsavari Shadi
Tangherlini Timothy R
Publication venue: eScholarship, University of California
Publication date: 01/12/2021

Social reading sites offer an opportunity to capture a segment of readers' responses to literature, while data-driven analysis of these responses can provide new critical insight into how people 'read'. Posts discussing an individual book on the social reading site, Goodreads, are referred to as 'reviews', and consist of summaries, opinions, quotes or some mixture of these. Computationally modelling these reviews allows one to discover the non-professional discussion space about a work, including an aggregated summary of the work's plot, an implicit sequencing of various subplots and readers' impressions of main characters. We develop a pipeline of interlocking computational tools to extract a representation of this reader-generated shared narrative model. Using a corpus of reviews of five popular novels, we discover readers' distillation of the novels' main storylines and their sequencing, as well as the readers' varying impressions of characters in the novel. In so doing, we make three important contributions to the study of infinite-vocabulary networks: (i) an automatically derived narrative network that includes meta-actants; (ii) a sequencing algorithm, REV2SEQ, that generates a consensus sequence of events based on partial trajectories aggregated from reviews, and (iii) an 'impressions' algorithm, SENT2IMP, that provides multi-modal insight into readers' opinions of characters.
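The idea of building a consensus sequence of events from partial trajectories, as mentioned in this abstract, can be illustrated with a simple pairwise-precedence vote. The sketch below is not REV2SEQ, whose details are not given here: it swaps in a basic scoring rule (how often an event precedes others minus how often it follows them), and the function name `consensus_order` and the example events are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def consensus_order(partial_sequences):
    """Order events by net pairwise precedence across partial sequences.

    Each input sequence lists events in the order one review mentions them.
    Ties are broken alphabetically. Illustrative stand-in, not REV2SEQ.
    """
    precedes = defaultdict(int)  # (earlier, later) -> number of supporting reviews
    events = set()
    for seq in partial_sequences:
        events.update(seq)
        for earlier, later in combinations(seq, 2):
            precedes[(earlier, later)] += 1

    def net_score(event):
        return sum(
            precedes[(event, other)] - precedes[(other, event)]
            for other in events
            if other != event
        )

    return sorted(events, key=lambda e: (-net_score(e), e))

# Hypothetical partial trajectories, one per review.
reviews = [
    ["meet", "quarrel"],
    ["quarrel", "reunion"],
    ["meet", "reunion"],
    ["meet", "quarrel", "reunion"],
]
print(consensus_order(reviews))
```

No single review needs to mention every event; the ordering emerges from the overlap between partial sequences, which is the property the abstract attributes to aggregated reviews.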

PubMed Central

eScholarship - University of California

Conspiracy in the Time of Corona: Automatic detection of Emerging Covid-19 Conspiracy Theories in Social Media and the News

Author: Holur Pavan
Roychowdhury Vwani
Shahsavari Shadi
Tangherlini Timothy R
Wang Tianyi
Publication venue: eScholarship, University of California
Publication date: 04/08/2020

eScholarship - University of California